Out-of-domain FrameNet Semantic Role Labeling
نویسندگان
چکیده
Domain dependence of NLP systems is one of the major obstacles to their application in large-scale text analysis, also restricting the applicability of FrameNet semantic role labeling (SRL) systems. Yet, current FrameNet SRL systems are still only evaluated on a single in-domain test set. For the first time, we study the domain dependence of FrameNet SRL on a wide range of benchmark sets. We create a novel test set for FrameNet SRL based on user-generated web text and find that the major bottleneck for out-of-domain FrameNet SRL is the frame identification step. To address this problem, we develop a simple, yet efficient system based on distributed word representations. Our system closely approaches the state-of-the-art in-domain while outperforming the best available frame identification system out-of-domain. We publish our system and test data for research purposes.1
منابع مشابه
Towards Open-Domain Semantic Role Labeling
Current Semantic Role Labeling technologies are based on inductive algorithms trained over large scale repositories of annotated examples. Frame-based systems currently make use of the FrameNet database but fail to show suitable generalization capabilities in out-of-domain scenarios. In this paper, a state-of-art system for frame-based SRL is extended through the encapsulation of a distribution...
متن کاملA System for Building FrameNet-like Corpus for the Biomedical Domain
Semantic Role Labeling (SRL) plays an important role in different text mining tasks. The development of SRL systems for the biomedical area is frustrated by the lack of large-scale domain specific corpora that are annotated with semantic roles. In our previous work, we proposed a method for building FramenNet-like corpus for the area using domain knowledge provided by ontologies. In this paper,...
متن کاملPreposition Disambiguation: Still a Problem
Considerable recent progress has been made in preposition disambiguation using the SemEval 2007 corpus, with results reaching accuracy of over 88 percent. However, with a new corpus of tagged instances, use of the models shows a decline in performance to around 43 percent. This suggests that recent efforts suffer from an out-of-domain problem. Detailed examination of the dimensions of this prob...
متن کاملFrame-Semantic Role Labeling with Heterogeneous Annotations
We consider the task of identifying and labeling the semantic arguments of a predicate that evokes a FrameNet frame. This task is challenging because there are only a few thousand fully annotated sentences for supervised training. Our approach augments an existing model with features derived from FrameNet and PropBank and with partially annotated exemplars from FrameNet. We observe a 4% absolut...
متن کاملShallow Semantic Parsing for Spoken Language Understanding
Most Spoken Dialog Systems are based on speech grammars and frame/slot semantics. The semantic descriptions of input utterances are usually defined ad-hoc with no ability to generalize beyond the target application domain or to learn from annotated corpora. The approach we propose in this paper exploits machine learning of frame semantics, borrowing its theoretical model from computational ling...
متن کامل